Modeling Protein-Protein Interactions in Biomedical Abstracts with Latent Dirichlet Allocation

Author

  • David Andrzejewski
Abstract

A major goal in biomedical text processing is the automatic extraction of protein interaction information from scientific articles or abstracts. We approach this task with a topic-based generative model. Under the model, sentences in biomedical abstracts can be generated by either an 'interaction' topic if they contain or discuss interacting proteins or a 'background' topic otherwise. This structure is implemented as a Latent Dirichlet Allocation (LDA) model. The model structure was previously developed as part of work with Mark Craven and Jerry Zhu. During this project, parameter inference equations and algorithms were derived. Future work will consist of implementation and experimental testing.
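
The sentence-level topic structure described in the abstract can be illustrated with a toy sampler. This is only a minimal sketch under stated assumptions, not the paper's model: the abstract does not give the priors, the vocabulary, or the exact dependency structure, so the function names, the symmetric Dirichlet hyperparameters `alpha` and `beta`, and the example vocabulary are all hypothetical.

```python
import random

TOPICS = ("background", "interaction")  # topic labels taken from the abstract

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_abstract(vocab, n_sentences, sentence_len, rng, alpha=1.0, beta=0.5):
    """Toy generative process: one topic per sentence, words drawn from that topic."""
    # One word distribution per topic (hypothetical symmetric prior beta).
    # Simplification: a real corpus model would share these across all abstracts.
    phi = {t: sample_dirichlet([beta] * len(vocab), rng) for t in TOPICS}
    # Mixing proportions over the two topics for this abstract (hypothetical prior alpha).
    theta = sample_dirichlet([alpha] * len(TOPICS), rng)
    sentences = []
    for _ in range(n_sentences):
        # Each sentence is assigned a single topic ...
        topic = rng.choices(TOPICS, weights=theta, k=1)[0]
        # ... and all of its words are drawn from that topic's word distribution.
        words = rng.choices(vocab, weights=phi[topic], k=sentence_len)
        sentences.append((topic, words))
    return sentences

rng = random.Random(0)
vocab = ["binds", "interacts", "protein", "cell", "study", "results"]  # made-up vocabulary
for topic, words in generate_abstract(vocab, n_sentences=3, sentence_len=5, rng=rng):
    print(topic, " ".join(words))
```

Inference runs in the opposite direction: given observed sentences, one would estimate which topic most plausibly generated each sentence. The abstract states that the inference equations were derived but does not reproduce them, so they are not sketched here.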

Similar resources

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

Familia: An Open-Source Toolkit for Industrial Topic Modeling

Familia is an open-source toolkit for pragmatic topic modeling in industry. Familia abstracts the utilities of topic modeling in industry as two paradigms: semantic representation and semantic matching. Efficient implementations of the two paradigms are made publicly available for the first time. Furthermore, we provide off-the-shelf topic models trained on large-scale industrial corpora, inclu...

Latent Dirichlet Allocation with Topic-in-Set Knowledge

Latent Dirichlet Allocation is an unsupervised graphical model which can discover latent topics in unlabeled data. We propose a mechanism for adding partial supervision, called topic-in-set knowledge, to latent topic modeling. This type of supervision can be used to encourage the recovery of topics which are more relevant to user modeling goals than the topics which would be recovered otherwise...

Automatic Summarization for Terminology Recommendation: The Case of the NCBO Ontology Recommender

The National Center for Biomedical Ontology (NCBO) ontology recommender helps users choose a biomedical terminology by analyzing a submitted document. Submitting a single document might not be representative and result in poor recommendations, while submitting a large sample might be expensive, sometimes unfeasible. In this paper, we investigate the effectiveness of two well-researched automati...

Statistical modeling of medical indexing processes for biomedical knowledge information discovery from text

The overwhelming amount of published literature in the biomedical domain and the growing number of collaborations across scientific disciplines result in an increasing topical complexity of research articles. This represents an immense challenge for efficient biomedical knowledge discovery from text. We present a new graphical model, the so-called Topic-Concept Model, which extends the basic La...

Publication date: 2006